Abstract

Strong replica consistency is often achieved by writing
deterministic applications, or by using a variety of
mechanisms to render replicas deterministic. There exists a
large body of work on how to render replicas deterministic
under the benign fault model. However, when replicas
can be subject to malicious faults, most of the previous
work is no longer effective. Furthermore, the determinism
of the replicas is often considered harmful from the
security perspective, and for many applications, integrity
strongly depends on the randomness of some of
their internal operations. This calls for new approaches
towards achieving replica consistency while preserving
replica randomness. In this paper, we present two
such approaches. One is based on Byzantine agreement
and the other on threshold coin-tossing. Each approach
has its strengths and weaknesses. We compare the
performance of the two approaches and outline their respective
best use scenarios.

1 Introduction

Strong replica consistency is an essential property for
replication-based fault tolerant distributed systems. It 
can be achieved via a number of different techniques. 
In this paper, we investigate the challenges in achiev- 
ing integrity-preserving strong replica consistency and 
present our solutions for state-machine based Byzantine 
fault tolerant systems [3]. While it is widely known that
strong replica consistency can also be achieved through
the systematic-checkpointing technique [12] for nondeterministic
applications in the benign fault model, that technique is
generally regarded as too expensive, and it is not suitable for
Byzantine fault tolerance.

The state-machine based approach is one of the fundamental
techniques in building fault tolerant systems [16].
In this approach, replicas are assumed to be either de- 
terministic or rendered-deterministic. There has been a 
large body of work on how to render replicas deterministic
in the presence of replica nondeterminism under the
benign fault model (e.g., [3, 4, 12, 15]). However, when
the replicas can be subject to Byzantine faults, which is 
the case for many Internet-based systems, most of the 
previous work is no longer effective. Furthermore, the 
determinism (or rendered-determinism) of the replicas is 
often considered harmful from the security perspective 
(e.g., with replication, an adversary can compromise any
of the replicas to obtain confidential information [6]), and
for many applications, integrity strongly depends on
the randomness of some of their internal operations.
For example, random numbers are used for unique identifier
generation in transactional systems and for shuffling cards
in online poker games. If the randomness is taken
away by a deterministic algorithm to ensure replica
consistency, the identifiers or the hands of cards can become
predictable, which can easily lead to exploits [19, 21].
This calls for new approaches towards achieving strong 
replica consistency while preserving the randomness of 
each replica's operations. 

In this paper, we present two alternative approaches 
towards our goal. The first one is based on Byzantine 
agreement [3] (referred to as the BA-algorithm in this paper)
and the other on a threshold coin-tossing scheme [2]
(referred to as the CT-algorithm). Both approaches rely 
on a collective determination for decisions involving ran- 
domness, and the determination is based on the contri- 
butions made by a set of replicas (at least one of which 
must be correct), to avoid the problems mentioned above. 
They differ mainly by how the collective determination 
is carried out. In the BA-algorithm, the replicas first 
reach a Byzantine agreement on the set of contributions 
from replicas, and then apply a deterministic algorithm 
(for all practical purposes, the bitwise exclusive-or operation [21])
to compute the final random value. The CT-algorithm
uses the threshold coin-tossing scheme introduced
in [2] to derive the final random value, without
the need for a Byzantine agreement step. Even though
the CT-algorithm saves on communication cost, it
incurs significant computation overhead due to the CPU-intensive
exponentiation calculations. Consequently, as
we will show in Section 7, the BA-algorithm performs
best in a Local-Area Network (LAN) environment,
whereas the CT-algorithm is more appropriate for a Wide-Area
Network (WAN) environment, where message passing
is expensive. Furthermore, to ensure the freshness
of the random numbers generated, the replicas using the 
BA-algorithm should have access to high entropy sources 
(which is relatively easy to satisfy) and the replicas should 
be able to refresh their key shares periodically in the CT- 
algorithm. For the latter, we envisage that a proactive 
threshold signature scheme could be used UK] [8]. How- 
ever, the discussion of proactive threshold signature tech- 
niques is out of the scope of this paper. 

To summarize, we make the following research contri- 
butions in this paper: 



• We point out the danger and pitfalls of control- 
ling replica randomness for the purpose of ensur- 
ing replica consistency. Removing randomness from 
replica operations (when it is needed) could seriously 
compromise the system integrity. 

• We propose the use of collective determination of 
random numbers contributed from replicas, as a 
practical way to reconcile the requirement of strong 
replica consistency and the preservation of replica 
randomness. 

• We present a light-weight, Byzantine agreement 
based algorithm to carry out the collective determi- 
nation. The BA-algorithm only introduces two ad- 
ditional communication steps because the Byzantine 
agreement for the collective determination of ran- 
dom numbers can be integrated into that for mes- 
sage total ordering, as needed by the state-machine 
replication. The BA-algorithm is particularly suited 
for Byzantine fault tolerant systems operating in the 
LAN environment, or where replicas are connected 
by high-speed low-latency networks. 

• We further present an algorithm that uses the threshold
coin-tossing scheme [2] as an alternative method
for collective determination of random numbers. The 
coin-tossing scheme is introduced in [2] as an instru- 
mental mechanism for a group of replicas to reach 
Byzantine agreement in asynchronous systems. To 
the best of our knowledge, our work is the first 
to show its usefulness in helping to ensure strong 
replica consistency without compromising the sys- 
tem integrity. 

• We conduct extensive experiments, in both a LAN 
testbed and an emulated WAN environment, to thor- 
oughly characterize the performance of the two ap- 
proaches. 

2 Byzantine Fault Tolerance 

In this section, we introduce the system model for our 
work, and the practical Byzantine fault tolerance algo- 
rithm (BFT algorithm, for short) developed by Castro and 
Liskov [3] as necessary background information.

Byzantine fault tolerance refers to the capability of a
system to tolerate Byzantine faults. It can be achieved by 
replicating the server and by ensuring that all server repli- 
cas reach an agreement on the total ordering of clients' 
requests despite the existence of Byzantine faulty repli- 
cas and clients. Such an agreement is often referred to as 
Byzantine agreement [11].

In recent years, a number of efficient Byzantine
agreement algorithms [3, 9, 20] have been proposed. In
this work, we focus on the BFT algorithm and use the
same system model as that in [3].

The BFT algorithm operates in an asynchronous dis- 
tributed environment. The safety property of the algo- 
rithm, i.e., all correct replicas agree on the total ordering 
of requests, is ensured without any assumption of syn- 
chrony. However, to guarantee liveness, i.e., for the algo- 
rithm to make progress towards the Byzantine agreement, 
certain synchrony is needed. Basically, it is assumed that 
the message transmission and processing delay has an 
asymptotic upper bound. This bound is dynamically ex- 
plored in the algorithm in that each time a view change 
occurs, the timeout for the new view is doubled. 

The BFT algorithm is executed by a set of 3f + 1 replicas
to tolerate up to f Byzantine faulty replicas. One of
the replicas is designated as the primary while the rest are
backups. Each replica is assigned a unique id i, where i
varies from 0 to 3f. For view v, the replica whose id i
satisfies i = v mod (3f + 1) serves as the primary.
The view starts from 0. For each view change, the view 
number is increased by one and a new primary is selected. 
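
The replica count and the primary rotation rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names are hypothetical and do not come from the BFT framework.

```python
def replica_count(f: int) -> int:
    """Number of replicas needed to tolerate f Byzantine faulty replicas."""
    return 3 * f + 1


def primary_id(view: int, f: int) -> int:
    """Id of the primary for a view, following i = v mod (3f + 1)."""
    return view % replica_count(f)


if __name__ == "__main__":
    f = 1
    # With f = 1 there are four replicas (ids 0..3) and the primary
    # role rotates through them as the view number increases.
    for view in range(6):
        print("view", view, "-> primary", primary_id(view, f))
```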

The normal operation of the BFT algorithm involves 
three phases. During the pre-prepare phase, the primary 
multicasts a pre-prepare message containing the client's 
request, the current view and a sequence number assigned 
to the request to all backups. A backup verifies the re- 
quest and the ordering information. If the backup ac- 
cepts the pre-prepare message, it multicasts a prepare 
message containing the ordering information and the di- 
gest of the request being ordered. This starts the prepare 
phase. A replica waits until it has collected 2f matching
prepare messages from different replicas, and the
pre-prepare message, before it multicasts a commit message
to other replicas, which starts the commit phase. The
commit phase ends when a replica has collected 2f + 1
matching commit messages from different replicas (possibly
including the one that it sent or would have sent
itself). At this point, the request message has been totally
ordered and it is ready to be delivered to the server appli- 
cation once all previous requests have been delivered. 
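
The two quorum conditions described above can be written as simple predicates. The sketch below assumes that message validation and matching have already been done and only counts distinct senders; the names `ready_to_commit` and `is_ordered` are hypothetical.

```python
def ready_to_commit(has_pre_prepare: bool, prepare_senders: set, f: int) -> bool:
    """A replica may multicast its commit message once it holds the
    pre-prepare message and 2f matching prepare messages from
    different replicas."""
    return has_pre_prepare and len(prepare_senders) >= 2 * f


def is_ordered(commit_senders: set, f: int) -> bool:
    """The request is totally ordered once 2f + 1 matching commit
    messages from different replicas have been collected."""
    return len(commit_senders) >= 2 * f + 1


if __name__ == "__main__":
    f = 1
    print(ready_to_commit(True, {1, 2}, f))   # two prepares, pre-prepare held
    print(is_ordered({0, 1, 2}, f))           # three commits collected
```

Using sets of sender ids makes duplicate messages from the same (possibly faulty) replica count only once.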

All messages exchanged among the replicas, and those 
between the replicas and the clients are protected by an 
authenticator [3] (for multicast messages), or by a message
authentication code (MAC) (for point-to-point
communications). An authenticator is formed by a number of
MACs, one for each target of the multicast. We assume
that the replicas and the clients each have a public/private
key pair, and the public keys are known to everyone.
These keys are used to generate symmetric keys needed
to produce/verify authenticators and MACs. To ensure
freshness, the symmetric keys are periodically refreshed
by the mechanism described in [3]. We assume that the
adversaries have limited computing power so that they 
cannot break the security mechanisms described above. 
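
As an illustration of an authenticator as a vector of MACs, the following sketch computes one MAC per multicast target. HMAC-SHA256 and the per-receiver key table are assumptions made for the example; the text does not specify the MAC algorithm or the key management code, and the function names are hypothetical.

```python
import hashlib
import hmac


def make_authenticator(message: bytes, session_keys: dict) -> dict:
    """Build one MAC per multicast target.

    session_keys maps a receiver id to the symmetric key shared with
    that receiver (assumed to have been established beforehand).
    """
    return {
        receiver: hmac.new(key, message, hashlib.sha256).digest()
        for receiver, key in session_keys.items()
    }


def verify_authenticator(message: bytes, authenticator: dict,
                         receiver: int, key: bytes) -> bool:
    """A receiver checks only the entry that was computed for it."""
    tag = authenticator.get(receiver)
    if tag is None:
        return False
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


if __name__ == "__main__":
    keys = {1: b"key-for-replica-1", 2: b"key-for-replica-2"}
    auth = make_authenticator(b"pre-prepare payload", keys)
    print(verify_authenticator(b"pre-prepare payload", auth, 1, keys[1]))
```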

Furthermore, we assume that a faulty replica cannot 
transmit the confidential state, such as the random num- 
bers collectively determined, to its colluding clients in 
real time. This can be achieved by using an application-level
gateway, or a privacy firewall as described by Yin et
al. [20], to filter out illegal replies. A compromised replica
may, however, replace the high entropy source from which it
retrieves random numbers with a deterministic algorithm,
and convey such an algorithm via out-of-band or covert 
channels to its colluding clients. 

3 Pitfalls in Controlling Replica Randomness

In this section, we analyze a few well-known approaches
that could be used to ensure replica consistency in the
presence of replica randomness. We show that they are not
robust against Byzantine faulty replicas and clients. 

Replicas that use a pseudo-random number generator
can be easily rendered deterministic by ensuring
that they use the same seed value to initialize the
generator. One might attempt to use the sequence number
assigned to the request as the seed. Even though this 
approach is perhaps the most economical way to render 
replicas deterministic (since no extra communication step 
is needed and no extra information is to be included in 
the control messages for total ordering of requests), it
virtually takes the randomness away from the fault tolerant
systems. In the presence of Byzantine clients, the
vulnerability can be exploited to compromise the integrity of the
system. For example, a Byzantine faulty client in an online
poker game can simply try out different integer values
as the seed to the pseudo-random generator (if the generator
is known to the client), compute the hands that each seed
would produce, and compare them with the cards it has
actually been dealt. Once a seed matches, the client can
predict the dealer's hand, place its bets accordingly, and
gain an unfair advantage.
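
The seed-guessing attack can be made concrete with a small sketch. It assumes a hypothetical dealer that shuffles a 52-card deck with Python's `random.Random` seeded by the request sequence number, and a client that knows this generator; none of these details come from a real poker system.

```python
import random

HAND_SIZE = 5


def deal(seed: int):
    """Hypothetical dealer: shuffle a deck with a seeded PRNG and deal
    the first cards to the client and the next ones to the dealer."""
    deck = list(range(52))
    random.Random(seed).shuffle(deck)
    return deck[:HAND_SIZE], deck[HAND_SIZE:2 * HAND_SIZE]


def guess_seed(my_hand, max_seed: int):
    """Attacker: try integer seeds until one reproduces the observed hand."""
    for candidate in range(max_seed + 1):
        if deal(candidate)[0] == my_hand:
            return candidate
    return None


if __name__ == "__main__":
    sequence_number = 4711          # used as the seed by every replica
    my_hand, dealer_hand = deal(sequence_number)
    recovered = guess_seed(my_hand, max_seed=10_000)
    predicted = deal(recovered)[1]
    print("dealer hand predicted correctly:", predicted == dealer_hand)
```

Because sequence numbers come from a small, ordered space, the search finishes almost instantly, which is why seeding from the sequence number removes the randomness for all practical purposes.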

A seemingly more robust approach is to use the timestamp
as the seed to the pseudo-random number generator.
As shown in [19, 21], the use of a timestamp does
not offer more robustness to the system because it can
also be guessed by Byzantine faulty clients. Furthermore,
the use of a timestamp imposes serious challenges in
asynchronous distributed systems because of the requirement
that all replicas must use the same timestamp to seed the
pseudo-random number generator. In [3], a mechanism is
proposed to handle this problem by asking the primary
to piggyback its timestamp, to be used by backups as 
well, with the pre-prepare message. However, the issue 
is that the backups have very limited ways of verifying 
the timestamp proposed (other than that the timestamp 
must be monotonically increasing) without resorting to 
strong synchrony assumptions (such as bounds on pro- 
cessing and message passing). 

The only option remaining seems to be the use of a truly 
random number to seed the pseudo-random number gen- 
erator (or to obtain random numbers entirely from a high 
entropy source). We note that the elegant mechanism
described in [3] cannot be used in this case because backups
have no means to verify whether the number proposed by 
the primary is taken from a high-entropy source, or is gen- 
erated according to a deterministic algorithm. If the latter 
is the case, the Byzantine faulty primary could continue 
colluding with Byzantine faulty clients without being de- 
tected. 

Therefore, we believe the most effective way to counter
such threats is to collectively determine the random
number, based on the contributions from a set of replicas 
so that Byzantine faulty replicas cannot influence the final 
outcome. The set size depends on the algorithms used, 
as we will show in the next two sections, but it must be 
greater than the number of faulty replicas tolerated (f) by
the system.

Figure 1: Normal operation of the BA-algorithm.



4 The BA-Algorithm 

The normal operation of the BA-algorithm is illustrated
in Figure 1. As can be seen, the collective-determination
mechanism is seamlessly integrated into the original BFT
algorithm. On ordering a request, the primary
determines the order of the request (i.e., assigns a sequence
number to the request), and queries the application for
the type of operation associated with the request. If the
operation takes a random number as input, the
primary activates the mechanism for the BA-algorithm.
The primary then obtains its share of the random number
from its own entropy source, and piggybacks
the share with the pre-prepare message multicast to all
backups. The pre-prepare message has the form
$\langle \text{PRE-PREPARE}, v, n, d, R_p \rangle_{\sigma_p}$, where $v$ is the view number, $n$
is the sequence number assigned to the request, $d$ is the
digest of the request, $R_p$ is the random number generated
by the primary, and $\sigma_p$ is the authenticator for the
message.

On receiving the pre-prepare message, a backup
performs the usual checks, such as the verification of the
authenticator, before it accepts the message. It also checks
whether the request will indeed trigger a randomized
operation, to prevent a faulty primary from putting unnecessary
load on correct replicas (which could lead to a
denial of service attack). If the pre-prepare message is
acceptable, the replica creates a pre-prepare certificate
for storing the relevant information, generates a share of
the random number from its entropy source, and multicasts
to all replicas a pp-update message, in the form
$\langle \text{PP-UPDATE}, v, n, i, R_i, d \rangle_{\sigma_i}$, where $i$ is the sending replica
identifier and $R_i$ is the random number contributed by replica
$i$.

When the primary has collected 2f pp-update messages,
it combines the random numbers received according
to a deterministic algorithm (referred to as the entropy
combination step in Figure 1), and builds a pp-update
message with slightly different content than those sent by
backups. In the pp-update message sent by the primary,
the $R_i$ component is replaced by a set $S_R$ of 2f + 1 tuples
containing the random numbers contributed by replicas
(possibly including its own share). Each tuple has
the form $\langle R_i, i \rangle$. The replica identifier is included in the
tuple to ease the verification of the set at backups.

On receiving a pp-update message, a backup accepts
the message and stores it in its data structure
provided that the message has a correct authenticator, it is
in view $v$, and the backup has accepted a pre-prepare message to
order the request with the digest $d$ and sequence number $n$.
A backup proceeds to the entropy combination step only if
(1) it has accepted a pp-update message from the primary,
and (2) it has accepted the 2f pp-update messages sent by the replicas
referenced in the set $S_R$. The backup requests a retransmission
from the primary for any missing pp-update message.
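
The precondition a backup checks before the entropy combination step can be sketched as follows. The sketch assumes that the backup keeps every share it has accepted (from the pre-prepare message and from pp-update messages) in a dictionary keyed by replica id; this data layout and the function name `ready_to_combine` are hypothetical.

```python
def ready_to_combine(primary_set, accepted_shares, f: int) -> bool:
    """Check the set S_R proposed by the primary against local state.

    primary_set     -- list of (random_number, replica_id) tuples taken
                       from the primary's pp-update message
    accepted_shares -- dict mapping replica_id to the share this backup
                       has itself accepted from that replica
    """
    ids = [replica_id for _, replica_id in primary_set]
    # The set must hold 2f + 1 tuples from distinct replicas.
    if len(primary_set) != 2 * f + 1 or len(set(ids)) != len(ids):
        return False
    # Every referenced share must match what the backup received directly;
    # a missing share would trigger a retransmission request instead.
    return all(accepted_shares.get(replica_id) == share
               for share, replica_id in primary_set)


if __name__ == "__main__":
    accepted = {0: 0x1A, 1: 0x2B, 2: 0x3C, 3: 0x4D}
    s_r = [(0x1A, 0), (0x2B, 1), (0x3C, 2)]
    print(ready_to_combine(s_r, accepted, f=1))
```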

After the entropy combination step is completed,
a backup multicasts a prepare message in the form
$\langle \text{PREPARE}, v, n, i, d' \rangle_{\sigma_i}$, where $d'$ is the digest of the
request concatenated with the combined random number.

When a replica has completed the entropy combination
step, and it has collected 2f valid prepare messages from
different replicas (possibly including the message that it sent or
would have sent itself), it multicasts to all replicas
a commit message in the form $\langle \text{COMMIT}, v, n, i, d' \rangle_{\sigma_i}$.
When a replica receives 2f + 1 valid commit messages,
it decides on the sequence number and the collectively
determined random number. At the time of delivery to the
application, both the request and the random number are
passed to the application.

In Figure 1, the durations of the entropy extraction and
combination steps have been intentionally exaggerated
for clarity. In practice, the entropy combination can be
achieved by applying a bitwise exclusive-or operation on
the set of random numbers collected, which is very fast.
The cost of entropy extraction depends on the scheme
used. Some schemes, such as the TrueRand method [10],
allow very prompt entropy extraction. TrueRand works
by gathering the underlying randomness from a computer
by measuring the drift between the system clock and the
interrupt-generation rate on the processor.

Figure 2: Normal operation of the CT-algorithm.
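
A minimal sketch of the entropy combination step is given below. It uses Python's `secrets` module as a stand-in entropy source (not TrueRand), represents shares as 64-bit integers, and computes the digest over the request bytes followed by the combined value using SHA-256; the share width, the hash function, and the function names are all assumptions made for the example.

```python
import hashlib
import secrets
from functools import reduce

SHARE_BITS = 64  # assumed share width


def extract_share() -> int:
    """Stand-in for entropy extraction at one replica."""
    return secrets.randbits(SHARE_BITS)


def combine_shares(shares) -> int:
    """Entropy combination: bitwise exclusive-or over all shares."""
    return reduce(lambda acc, share: acc ^ share, shares, 0)


def digest_with_random(request: bytes, combined: int) -> bytes:
    """One possible reading of d': hash the request concatenated with
    the combined random number."""
    return hashlib.sha256(request + combined.to_bytes(SHARE_BITS // 8, "big")).digest()


if __name__ == "__main__":
    shares = [extract_share() for _ in range(3)]   # 2f + 1 shares, f = 1
    combined = combine_shares(shares)
    print(hex(combined), digest_with_random(b"request", combined).hex())
```

Since exclusive-or is commutative and associative, every correct replica that combines the same agreed set obtains the same value regardless of the order in which the shares arrived.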

5 The CT-Algorithm

The normal operation of the CT-algorithm is shown in
Figure 2. The CT-algorithm is the same as the BFT
algorithm in the first two phases (i.e., the pre-prepare and
prepare phases). The commit phase is modified by
incorporating threshold coin-tossing operations. Most existing
(k,l) threshold signature schemes IU |5l H El H31 can be 
used for the CT-algorithm, where k is the threshold number
of signature shares needed to produce the group
signature, and l = 3f + 1 is the total number of players
(i.e., replicas in our case) participating in the threshold
signing. In most (k, l) threshold signature schemes, a correct
group signature can be derived by combining shares from
k = t + 1 players, where t = f is the maximum number
of corrupted players tolerated. Some schemes, such
as the RSA-based scheme in [14], allow the flexibility of
using up to k = l - t as the minimum number of shares
required to produce the group signature. Since l = 3f + 1
in our work, k can be set as high as 2f + 1. This
property offers additional protection against Byzantine faulty
replicas [14].
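
The admissible range of the threshold k follows directly from l = 3f + 1 and t = f. The helper below is only a worked instance of that arithmetic; its name is hypothetical.

```python
def threshold_range(f: int):
    """Return (k_min, k_max) for a (k, l) scheme with l = 3f + 1 players
    and at most t = f corrupted players: k_min = t + 1, k_max = l - t."""
    l = 3 * f + 1
    t = f
    return t + 1, l - t


if __name__ == "__main__":
    # For f = 1 the threshold can be 2 or 3, which corresponds to the
    # two k values used in the CT2-* and CT3-* configurations.
    print(threshold_range(1))
```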

At the beginning of the commit phase, each replica
generates its share of the threshold signature by signing $\langle d \| n \rangle$
using its private key share, where $d$ is the digest of
the request message and $n$ is the sequence number
assigned to the request. This operation is referred to as the
share-generation step in Figure 2. The signature share
is piggybacked with the commit message, in the form
$\langle \text{COMMIT}, v, n, i, d, T(d \| n, i) \rangle_{\sigma_i}$, where $T(d \| n, i)$ is
replica $i$'s share of the threshold signature.

When a replica has collected 2f + 1 valid commit messages
from different replicas, it executes the shares-combination
step by combining k threshold signature
shares piggybacked with the commit messages. After
the shares have been combined into a group signature,
the signature is mapped into a random number, first by hashing the
group signature with a secure hash function (e.g., SHA1),
and then by taking the most significant bits
of the hash according to the type of number needed,
e.g., 32 bits. The random number is delivered
together with the request to the application once all
previous requests have been delivered.
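
The mapping from a group signature to a random number can be sketched with the standard `hashlib` module. The group signature here is a placeholder byte string, since no threshold signing is performed in this example, and the function name is hypothetical.

```python
import hashlib


def signature_to_random(group_signature: bytes, num_bits: int = 32) -> int:
    """Hash the group signature with SHA-1 and keep the num_bits most
    significant bits of the 160-bit digest."""
    digest = hashlib.sha1(group_signature).digest()
    total_bits = len(digest) * 8
    if not 0 < num_bits <= total_bits:
        raise ValueError("num_bits must be between 1 and 160")
    return int.from_bytes(digest, "big") >> (total_bits - num_bits)


if __name__ == "__main__":
    placeholder_signature = b"combined group signature bytes"
    print(signature_to_random(placeholder_signature))
```

Because all correct replicas obtain the same group signature and apply the same hash, they derive the same number without any further message exchange.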

6 Informal Proof of Correctness 

In this section, we provide an informal argument on the 
correctness of our two algorithms. The correctness crite- 
ria for the algorithms are: 

C1 All correct replicas deliver the same random number
to the application together with the associated
request, and

C2 The random number is secure (i.e., it is truly random)
in the presence of up to f Byzantine faulty replicas.

We first argue for the BA-algorithm. C1 is guaranteed
by the use of the Byzantine agreement algorithm. C2 is
ensured by the collection of 2f + 1 shares contributed by
different replicas, and by a sound entropy combination
algorithm (e.g., by using the bitwise exclusive-or operation
on the set to produce the combined random number). By
collecting 2f + 1 contributions, it is guaranteed that at
least f + 1 of them are from correct replicas, so faulty
replicas cannot completely control the set.¹ The entropy
combination algorithm ensures that the combined random
number is secure as long as at least one share is secure.
The bitwise exclusive-or operation could be used to
combine the set and it is provably secure for this purpose [21].
Therefore, the BA-algorithm satisfies both C1 and C2.
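
The claim that one secure share suffices can be checked exhaustively for a small bit width. The sketch below fixes arbitrary shares for the faulty replicas, lets a single honest share range over all values, and counts the outcomes. It assumes that the faulty shares are chosen without knowledge of the honest share; the function name is hypothetical.

```python
from collections import Counter


def xor_outcomes(bits: int, faulty_shares) -> Counter:
    """Count how often each combined value occurs when one honest share
    takes every possible value and the faulty shares stay fixed."""
    counts = Counter()
    for honest_share in range(2 ** bits):
        combined = honest_share
        for share in faulty_shares:
            combined ^= share
        counts[combined] += 1
    return counts


if __name__ == "__main__":
    counts = xor_outcomes(bits=4, faulty_shares=[0b1011, 0b0110])
    # Every 4-bit value appears exactly once, so a uniformly random
    # honest share yields a uniformly random combined value.
    print(sorted(counts.items()))
```

Exclusive-or with a fixed value is a bijection on bit strings of a given length, which is why the combined value inherits the uniform distribution of the honest share.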

Next we argue for the CT-algorithm. C1 is guaranteed
by the following facts: (1) The same message ($\langle d \| n \rangle$)
is signed by all correct replicas, according to the
CT-algorithm. (2) The threshold signature algorithm
guarantees the production of the same group signature by
combining k shares. Different replicas could obtain different
sets of k shares and yet they all lead to the same group
signature. (3) The same secure hash function is used to hash
the group signature. C2 is guaranteed by the threshold
signature algorithm. For the threshold signature algorithm
used in our implementation, its security is proven in the
random oracle model [14]. Therefore, the CT-algorithm
is correct as well. This completes our proof.

7 Performance Characterization 

The BA-algorithm and the CT-algorithm have been implemented
and incorporated into a Java-based BFT framework.
The framework is developed in
house and is ported from the C++ based BFT framework
of Castro and Liskov [3]. Due to space limitations,
the details of the framework implementation are omitted.
The CT-algorithm uses Shoup's threshold signature
scheme [14], implemented by Steve Weis and made available
at SourceForge [18].

The development and test platform consists of a group
of Dell SC440 servers, each equipped with a 2.8GHz Pentium D
processor and 1GB of RAM, running SuSE
10.2 Linux. The nodes are connected via a 100Mbps
LAN. As we noted earlier, the WAN experiments are em- 
ulated by introducing artificial delays in communication, 
without injecting message loss. 

To characterize the cost of the two algorithms, we use
an echo application with fixed 1KB-long requests and
replies. The server is replicated at four nodes, and hence,
f = 1 in all our measurements. Up to 12 concurrent
clients are launched across the remaining nodes (at most
one client per node). Each client issues consecutive
requests without any think time. For the CT-algorithm,
we vary a number of parameters, including the threshold
value and the key length. We also experiment with certain
optimizations. For all measurements, the end-to-end
latency is measured at the client and the throughput is
measured at the replicas. The Java System.nanoTime()
API is used for all timing-related measurements.

Operation Type    Signing/Generation    Verification/Combination
MAC               24.1 µs               237.3 µs
Authenticator     80.2 µs               892.0 µs
CT2-64            2.2 ms                4.6 ms
CT2-128           7.1 ms                12.8 ms
CT2-256           31.7 ms               58.5 ms
CT2-512           179.1 ms              338.2 ms
CT2-1024          1191.7 ms             1381.4 ms
CT3-64            2.2 ms                5.6 ms
CT3-128           7.1 ms                18.5 ms
CT3-256           31.7 ms               80.0 ms
CT3-512           179.1 ms              449.7 ms
CT3-1024          1191.7 ms             2292.1 ms

Table 1: Execution time for basic cryptographic operations
involved with our algorithms. The data shown for
CT signing is for a single share.

¹ The use of f + 1 shares is all that is needed for this purpose.
However, collecting more shares is more robust in cases when some correct
replicas use low-entropy sources. This is analogous to the benefit of
Shoup's threshold signature scheme [14].

7.1 Cost of Cryptographic Operations 

We first report the mean execution latency of basic crypto- 
graphic operations involved in the BA-algorithm and the 
CT-algorithm because such information is beneficial to 
the understanding of the behaviors we observe. The la- 
tency cost is obtained when running a single client and 4 
server replicas in the LAN testbed. The results are summarized
in Table 1. As can be seen, the threshold signature
operations are quite expensive, and it is impractical
to use a key as long as 1024 bits.

Without any optimization (and without faults), an
end-to-end remote call from a client to the replicated server
using the original BFT algorithm involves a total of 4
authenticator generation operations ($A_g$), 5 authenticator
verification operations ($A_v$) (a replica does not need to
verify the message sent by itself), 1 MAC generation
operation ($M_g$) and 2 MAC verification operations ($M_v$) on the
critical execution path (i.e., $A_g + A_v$ for request sending
and receiving, $A_g + A_v$ for the pre-prepare phase,
$A_g + A_v$ for the prepare phase, $A_g + 2A_v$ for the commit
phase, and $M_g + 2M_v$ for reply sending and receiving).
The BA-algorithm introduces two additional
communication steps and $2A_g$ and $3A_v$ on the critical path.
The CT-algorithm does not require any additional
communication step, but introduces 1 threshold signing
operation ($T_s$) and 1 operation for threshold shares verification
and combination ($T_v$). From this analysis, the minimum
end-to-end latency achievable using the BA-algorithm is
$L^{min}_{BA} = 6A_g + 8A_v + M_g + 2M_v$ (a replica can
proceed to the next step as soon as it receives 1 valid prepare
message from another replica in the prepare phase, and 2
valid commit messages from other replicas in the commit
phase, and the client can proceed to deliver the reply
as soon as it has received 2 consistent replies). Similarly,
the minimum latency using the CT-algorithm is
$L^{min}_{CT} = 4A_g + 5A_v + M_g + 2M_v + T_s + T_v$. Based on the values
given in Table 1, $L^{min}_{BA} = 8.1$ms and $L^{min}_{CT} = 12.1$ms
for k = 2 and a 64-bit key. The minimum overhead
incurred by the BA-algorithm is $2A_g + 3A_v = 2.8$ms and
that by the CT-algorithm is $T_s + T_v = 6.8$ms for k = 2
and a 64-bit key.
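
The arithmetic behind these estimates can be reproduced from the Table 1 values. The short script below is a worked check; the variable names mirror the symbols in the text, and all times are in microseconds.

```python
# Table 1 values (microseconds); CT values are for k = 2 and a 64-bit key.
A_G, A_V = 80.2, 892.0        # authenticator generation / verification
M_G, M_V = 24.1, 237.3        # MAC generation / verification
T_S, T_V = 2200.0, 4600.0     # threshold signing / verification+combination

base = 4 * A_G + 5 * A_V + M_G + 2 * M_V      # original BFT algorithm
ba_overhead = 2 * A_G + 3 * A_V               # extra cost of the BA-algorithm
ct_overhead = T_S + T_V                       # extra cost of the CT-algorithm

l_min_ba = base + ba_overhead                 # equals 6A_g + 8A_v + M_g + 2M_v
l_min_ct = base + ct_overhead

if __name__ == "__main__":
    for name, value in [("BA overhead", ba_overhead), ("CT overhead", ct_overhead),
                        ("L_min BA", l_min_ba), ("L_min CT", l_min_ct)]:
        print(f"{name}: {value / 1000:.1f} ms")
```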

7.2 LAN Experimental Results 

Figure 3 shows the summary of the experimental results
obtained in the LAN testbed. The end-to-end latency
(plotted in log-scale) measured at a single client under
various configurations is shown in Figure 3(a). As a
reference, the latency for the BFT system without the
additional mechanisms described in this paper is shown as
"Base". In the figure, the result for the BA-algorithm is
shown as "BA", and the results for the CT-algorithm with
different parameter settings are labeled as CT#-i, where #
is the k value, and i is the key length in bits. As can be seen,
the CT-algorithm incurs significant overhead unless a very
short key is used. Furthermore, the observed end-to-end
latency results are in line with the analysis provided
in the previous subsection.

The throughput measurement results shown in Figure 3(b)
are consistent with those in the end-to-end
latency measurements. The results labeled "no
batching" are obtained for the original CT-algorithm
described in Section 5, i.e., one coin-tossing operation
(i.e., threshold share signing, and combination and verification
of k shares) is used for every request. Those labeled
"with batching" are measured when the requests are
batched (for total ordering, they all share the same sequence
number [3]) and only one coin-tossing operation
is used for the entire batch of requests. As can be seen
from Figure 3(b), the gain in throughput is significant with
the batching optimization. However, if sharing the same
random number among several requests is a concern, this
optimization must be disabled.

Figure 3: LAN measurement results. (a) End-to-end
latency under various configurations. (b) The system
throughput in the presence of different numbers of concurrent
clients. (c) End-to-end latency as a function of the
load on the system (throughput).

For the BA-algorithm, the communication steps for 
reaching a Byzantine agreement on the set of random 
numbers are automatically batched together with those for
total ordering of requests. Batching the Byzantine agreement
for a set of random numbers does not seem to
introduce any vulnerability. The additional optimization of
one set of entropy extraction and combination per batch of
requests does not have any noticeable performance benefit.
Therefore, we advise against this further optimization
in practice due to possible security
concerns.

Figure 3(c) shows the end-to-end latency as a function
of the load on the system in the presence of concurrent 
clients. We use the system throughput as a metric for the 
system load because it better reflects the actual load on 
the system than the number of clients. It is also useful 
to compare with the results in the WAN experiments. As 
can be seen, for the CT-algorithm, without the batching 
optimization, the latency increases very sharply with the 
load, due to the CPU intensive threshold signature com- 
putations. 

The results for the CT-algorithm with keys longer than
64 bits are omitted in Figure 3(b) and (c) to avoid clutter.
The throughput is significantly lower and the end-to-end
latency is much higher than those of the BA-algorithm
in these configurations, especially when the load is high.

7.3 WAN Experimental Results 

The experimental results obtained in an emulated WAN
environment are shown in Figure 4. The observed metrics
and the parameters used are identical to those in the LAN
experiments. As can be seen in Figure 4(a), the end-to-end
latency as perceived by a single client is similar for
the BA-algorithm and the CT-algorithm with a key size up
to 256 bits (for either k = 2 or k = 3). This can be easily
understood because the end-to-end latency is dominated
by the communication delays, as indicated by the end-to-end
latency for the base system included in the figure.

Figure 4(b) shows part of the measurement results on 
system throughput under different numbers of concurrent 
clients. To avoid clutter, only the results for k = 2 and 
key sizes of up to 256 bits are shown. The throughput for 
the base system is included as a reference. As can be seen, 
when batching for the coin-tossing operation is enabled, 
the CT-algorithm with short-to-medium sized keys 
outperforms the BA-algorithm. When batching is disabled, 
however, the BA-algorithm performs better unless a very 
small key is used for the CT-algorithm. The end-to-end 
latency results shown in Figure 4(c) confirm the trend. 

Figure 4: Emulated WAN measurement results. (a) End-to-end 
latency under various configurations. (b) The system 
throughput in the presence of different numbers of 
concurrent clients. (c) End-to-end latency as a function 
of the load on the system (throughput). 



8 Related Work 



How to ensure strong replica consistency in the presence 
of replica nondeterminism has been of research interest 
for a long time, especially for fault tolerant systems using 
the benign fault model [3] 0] [12] [15]. However, while the 
importance of the use of good random numbers has long 
been recognized in building secure systems [19], we have 
yet to see substantial research work on how to preserve 
the randomized operations necessary to ensure the system 
integrity in a fault tolerant system. For the type of 
systems where the use of random numbers is crucial to their 
service integrity, the benign fault model is obviously 
inadequate and the Byzantine fault model must be employed 
if fault tolerance is required. 

In recent years, significant progress has been made 
towards building practical Byzantine fault tolerant 
systems, as shown in the series of seminal papers 
such as [3] [4] [9] HQ] . This makes it possible to address 
the problem of reconciling the requirement of strong 
replica consistency with the preservation of each replica's 
randomness for real-world applications that require both 
high availability and a high degree of security. We believe 
the work presented in this paper is an important step 
towards solving this challenging problem. 

We should note that some form of replica nondeterminism 
(in particular, replica nondeterminism related to 
timestamp operations) has been studied in the context 
of Byzantine fault tolerant systems EH). However, we have 
argued in previous sections that the existing approach is 
vulnerable to the presence of colluding Byzantine faulty 
replicas and clients. 

The main idea of this work, i.e., collective determination 
of random values based on the contributions made 
by the replicas, is borrowed from the design principles 
for secure communication protocols flTl . However, the 
application of this principle in solving the strong replica 
consistency problem is novel. 

The CT-algorithm is inspired by the work of Cachin, 
Kursawe and Shoup [2], in particular, the idea of 
exploiting threshold signature techniques for agreement. 
However, we have adapted this idea to solve a totally 
different problem, i.e., it is used towards reaching 
integrity-preserving strong replica consistency. 
Furthermore, we carefully studied what to sign for each 
request so that the final random number obtained is not 
vulnerable to attacks. 



9 Conclusion and Future Work 

In this paper, we presented our work on reconciling the 
requirement of strong replica consistency with the desire to 
maintain each replica's individual randomness. Based 
on the central idea of collective determination of 
random values needed by the applications for their service 
integrity, we designed and implemented two algorithms. 
The first one, the BA-algorithm, is based on reaching a 
Byzantine agreement on a set of random number shares 
provided by 2f + 1 replicas. The second one, the 
CT-algorithm, is based on threshold signature techniques. We 
thoroughly characterized the performance of the two 
algorithms in both a LAN testbed and an emulated WAN 
environment. We show that the BA-algorithm outperforms 
the CT-algorithm in most cases, except in WAN operation 
under relatively light load. Furthermore, the overhead 
incurred by the BA-algorithm with respect to the base BFT 
system is relatively small, making it suitable for 
practical use. 

Future research work will focus on the threshold key 
share refreshment issue for the CT-algorithm. To ensure 
long-term robustness of the system, the key shares 
must be proactively refreshed periodically. Otherwise, the 
random numbers generated this way may age over time, 
which may open the door for attacks. The threshold 
signature algorithm used in this work [14] does not have a 
built-in mechanism for key share refreshment. We will 
explore other threshold signature algorithms that offer this 
capability [1] [7] [U. 